159 research outputs found
An extensive empirical study of collocation extraction methods
This paper presents the status quo of an ongoing research study of collocations, an essential linguistic phenomenon with a wide spectrum of applications in the field of natural language processing. The core of the work is an empirical evaluation of a comprehensive list of automatic collocation extraction methods using precision-recall measures, and a proposal of a new approach integrating multiple basic methods and statistical classification. We demonstrate that combining multiple independent techniques leads to a significant performance improvement in comparison with individual basic methods.
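As an illustration of what one "basic method" and its precision-recall evaluation might look like, the following is a minimal sketch that scores adjacent word pairs by pointwise mutual information (PMI), a widely used association measure. The abstract does not name the individual methods, so the choice of PMI, the function names, and the `min_count` threshold are assumptions made for this example only.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (base 2).

    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events. Both names are hypothetical.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def precision_recall_at_k(ranked, gold, k):
    """Precision and recall of the top-k ranked candidates against a gold set."""
    hits = sum(1 for candidate in ranked[:k] if candidate in gold)
    return hits / k, hits / len(gold)
```

Ranking candidates by such a score and sweeping `k` yields a precision-recall curve for that measure; a combined approach would feed several such scores into a classifier instead of relying on a single one.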
An augmented three-pass system combination framework: DCU combination system for WMT 2010
This paper describes the augmented three-pass system combination framework of the Dublin City University (DCU) MT group for the WMT 2010 system combination task. The basic three-pass framework includes building individual confusion networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR) decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring component which is used to re-rank the N-best lists generated from the individual CNs and the super network, 2) a new hypothesis alignment metric, TERp, that is used to carry out English-targeted hypothesis alignment, and 3) a greater variety of backbone-based CNs which are employed to increase the diversity of the mConMBR decoding phase. We took part in the combination tasks of English-to-Czech and French-to-English. Experimental results show that our proposed combination framework achieved improvements of 2.17 absolute points (13.36 relative points) and 1.52 absolute points (5.37 relative points) in terms of BLEU score on the English-to-Czech and French-to-English tasks respectively over the best single system. We also achieved better performance on human evaluation
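To make the idea of Minimum Bayes-risk decoding concrete, here is a minimal sketch of generic MBR selection over an N-best list. It is not the authors' mConMBR decoder, which operates on confusion networks. The word-overlap loss and all function names are assumptions chosen to keep the example self-contained; a real system would typically use a loss derived from a metric such as BLEU.

```python
def word_overlap_loss(hyp, ref):
    """Toy loss: 1 minus the Jaccard overlap of the two word sets."""
    a, b = set(hyp.split()), set(ref.split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def mbr_select(nbest, loss=word_overlap_loss):
    """Return the hypothesis with the lowest expected loss.

    `nbest` is a list of (hypothesis, posterior_probability) pairs. Each
    hypothesis is scored against every other one, weighted by posterior.
    """
    best_hyp, best_risk = None, float("inf")
    for hyp, _ in nbest:
        risk = sum(prob * loss(hyp, other) for other, prob in nbest)
        if risk < best_risk:
            best_hyp, best_risk = hyp, risk
    return best_hyp
```

With the list `[("a dog ran", 0.4), ("the cat sat", 0.35), ("the cat sits", 0.25)]`, this selects "the cat sat" even though "a dog ran" has the highest single probability, because the two similar hypotheses support each other. That consensus effect is what makes MBR attractive for system combination.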
Towards a user-friendly webservice architecture for statistical machine translation in the PANACEA project
This paper presents a webservice architecture for Statistical Machine Translation aimed at non-technical users. A workflow editor allows a user to combine different webservices using a graphical user interface. In the current state of this project, the webservices have been implemented for a range of sentential and sub-sentential aligners. A common interface and a common data format allow the user to build workflows exchanging different aligners
Adapting SMT Query Translation Reranker to New Languages in Cross-Lingual Information Retrieval
We investigate adaptation of a supervised machine learning model for reranking of query translations to new languages in the context of cross-lingual information retrieval. The model is trained to rerank multiple translations produced by a statistical machine translation system and optimize retrieval quality. The model features do not depend on the source language and thus allow the model to be trained on query translations coming from multiple languages. In this paper, we explore how this affects the final retrieval quality. The experiments are conducted on a medical-domain test collection in English and multilingual queries (in Czech, German, French) from the CLEF eHealth Lab series 2013-2015.
We adapt our method to allow reranking of query translations for four new languages (Spanish, Hungarian, Polish, Swedish). The baseline approach, where a single model is trained for each source language on query translations from that language, is compared with a model co-trained on translations from the three original languages
Adaptation of Machine Translation to Specific Domains and Applications
Matematicko-fyzikální fakult
Towards using web-crawled data for domain adaptation in statistical machine translation
This paper reports on the ongoing work focused on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and their exploitation for testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and two language pairs: English-French and English-Greek
MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service
We present a web service which handles and distributes JSON-encoded HTTP
requests for machine translation (MT) among multiple machines running
an MT system, including text pre- and post-processing.
It is currently used to provide MT between several languages
for cross-lingual information retrieval in the Khresmoi project.
The software consists of an application server and remote workers which handle
text processing and communicate translation requests to MT
systems. The communication between the application server and the workers is
based on the XML-RPC protocol. We present
the overall design of the software and test results which document
speed and scalability of our solution.
Our software is licensed under the Apache 2.0 licence and is available for
download from the Lindat-Clarin repository and GitHub
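The architecture described above (JSON requests arriving at an application server, which forwards them to workers over XML-RPC) can be sketched with the Python standard library alone. This is a toy illustration and not MTMonkey's actual code or API: the request fields, the `process_task` method name, the round-robin policy, and the uppercase "translation" are all assumptions standing in for the real components.

```python
import json
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_worker():
    """Start a toy XML-RPC worker on a free local port; return (server, url)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)

    def process_task(task):
        # Stand-in for pre-processing, MT, and post-processing.
        return {"translation": task["text"].upper()}

    server.register_function(process_task, "process_task")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


class Dispatcher:
    """Toy application server: decodes JSON and forwards tasks round-robin."""

    def __init__(self, worker_urls):
        self.worker_urls = list(worker_urls)
        self._next = 0  # not thread-safe; adequate for a sketch

    def handle(self, json_request):
        task = json.loads(json_request)
        url = self.worker_urls[self._next % len(self.worker_urls)]
        self._next += 1
        result = ServerProxy(url).process_task(task)
        return json.dumps(result)
```

A caller would start one or more workers with `start_worker()`, pass their URLs to `Dispatcher`, and call `handle()` with a JSON string such as `{"text": "ahoj", "sourceLang": "cs", "targetLang": "en"}`. Separating the dispatcher from the workers is what allows more machines to be added without changing the client-facing interface.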
CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation Tasks
Neural sequence to sequence learning recently became a very promising
paradigm in machine translation, achieving competitive results with statistical
phrase-based systems. In this system description paper, we attempt to utilize
several recently published methods used for neural sequential learning in order
to build systems for WMT 2016 shared tasks of Automatic Post-Editing and
Multimodal Machine Translation. Comment: Accepted to the First Conference of Machine Translation (WMT16 - …